Search | VHL Regional Portal

1.

A saturated map of common genetic variants associated with human height.

Yengo, Loïc; Vedantam, Sailaja; Marouli, Eirini; Sidorenko, Julia; Bartell, Eric; Sakaue, Saori; Graff, Marielisa; Eliasen, Anders U; Jiang, Yunxuan; Raghavan, Sridharan; Miao, Jenkai; Arias, Joshua D; Graham, Sarah E; Mukamel, Ronen E; Spracklen, Cassandra N; Yin, Xianyong; Chen, Shyh-Huei; Ferreira, Teresa; Highland, Heather H; Ji, Yingjie; Karaderi, Tugce; Lin, Kuang; Lüll, Kreete; Malden, Deborah E; Medina-Gomez, Carolina; Machado, Moara; Moore, Amy; Rüeger, Sina; Sim, Xueling; Vrieze, Scott; Ahluwalia, Tarunveer S; Akiyama, Masato; Allison, Matthew A; Alvarez, Marcus; Andersen, Mette K; Ani, Alireza; Appadurai, Vivek; Arbeeva, Liubov; Bhaskar, Seema; Bielak, Lawrence F; Bollepalli, Sailalitha; Bonnycastle, Lori L; Bork-Jensen, Jette; Bradfield, Jonathan P; Bradford, Yuki; Braund, Peter S; Brody, Jennifer A; Burgdorf, Kristoffer S; Cade, Brian E; Cai, Hui.

Nature ; 610(7933): 704-712, 2022 10.

Article in English | MEDLINE | ID: mdl-36224396

ABSTRACT

Common single-nucleotide polymorphisms (SNPs) are predicted to collectively explain 40-50% of phenotypic variation in human height, but identifying the specific variants and associated regions requires huge sample sizes¹. Here, using data from a genome-wide association study of 5.4 million individuals of diverse ancestries, we show that 12,111 independent SNPs that are significantly associated with height account for nearly all of the common SNP-based heritability. These SNPs are clustered within 7,209 non-overlapping genomic segments with a mean size of around 90 kb, covering about 21% of the genome. The density of independent associations varies across the genome and the regions of increased density are enriched for biologically relevant genes. In out-of-sample estimation and prediction, the 12,111 SNPs (or all SNPs in the HapMap 3 panel²) account for 40% (45%) of phenotypic variance in populations of European ancestry but only around 10-20% (14-24%) in populations of other ancestries. Effect sizes, associated regions and gene prioritization are similar across ancestries, indicating that reduced prediction accuracy is likely to be explained by linkage disequilibrium and differences in allele frequency within associated regions. Finally, we show that the relevant biological pathways are detectable with smaller sample sizes than are needed to implicate causal genes and variants. Overall, this study provides a comprehensive map of specific genomic regions that contain the vast majority of common height-associated variants. Although this map is saturated for populations of European ancestry, further research is needed to achieve equivalent saturation in other ancestries.

Subject(s)

Body Height , Chromosome Mapping , Polymorphism, Single Nucleotide , Humans , Body Height/genetics , Gene Frequency/genetics , Genome, Human/genetics , Genome-Wide Association Study , Haplotypes/genetics , Linkage Disequilibrium/genetics , Polymorphism, Single Nucleotide/genetics , Europe/ethnology , Sample Size , Phenotype

2.

A library of induced pluripotent stem cells from clinically well-characterized, diverse healthy human individuals.

Schaniel, Christoph; Dhanan, Priyanka; Hu, Bin; Xiong, Yuguang; Raghunandan, Teeya; Gonzalez, David M; Dariolli, Rafael; D'Souza, Sunita L; Yadaw, Arjun S; Hansen, Jens; Jayaraman, Gomathi; Mathew, Bino; Machado, Moara; Berger, Seth I; Tripodi, Joseph; Najfeld, Vesna; Garg, Jalaj; Miller, Marc; Surlyn, Colleen S; Michelis, Katherine C; Tangirala, Neelima C; Weerahandi, Himali; Thomas, David C; Beaumont, Kristin G; Sebra, Robert; Mahajan, Milind; Schadt, Eric; Vidovic, Dusica; Schürer, Stephan C; Goldfarb, Joseph; Azeloglu, Evren U; Birtwistle, Marc R; Sobie, Eric A; Kovacic, Jason C; Dubois, Nicole C; Iyengar, Ravi.

Stem Cell Reports ; 16(12): 3036-3049, 2021 12 14.

Article in English | MEDLINE | ID: mdl-34739849

ABSTRACT

A library of well-characterized human induced pluripotent stem cell (hiPSC) lines from clinically healthy human subjects could serve as a useful resource of normal controls for in vitro human development, disease modeling, genotype-phenotype association studies, and drug response evaluation. We report generation and extensive characterization of a gender-balanced, racially/ethnically diverse library of hiPSC lines from 40 clinically healthy human individuals who range in age from 22 to 61 years. The hiPSCs match the karyotype and short tandem repeat identities of their parental fibroblasts, and have a transcription profile characteristic of pluripotent stem cells. We provide whole-genome sequencing data for one hiPSC clone from each individual, genomic ancestry determination, and analysis of mendelian disease genes and risks. We document similar transcriptomic profiles, single-cell RNA-sequencing-derived cell clusters, and physiology of cardiomyocytes differentiated from multiple independent hiPSC lines. This extensive characterization makes this hiPSC library a valuable resource for many studies on human biology.

Subject(s)

Health , Induced Pluripotent Stem Cells/cytology , Adult , Calcium Signaling , Cell Differentiation , Cell Line , Clone Cells , Ethnicity , Female , Gene Expression Profiling , Gene Expression Regulation , Genetic Predisposition to Disease , Genetic Variation , Heart Atria/cytology , Heart Ventricles/cytology , Humans , Male , Middle Aged , Myocytes, Cardiac/cytology , Myocytes, Cardiac/metabolism , Risk Factors , Young Adult

3.

Genome-wide homozygosity and risk of four non-Hodgkin lymphoma subtypes.

Moore, Amy; Machiela, Mitchell J; Machado, Moara; Wang, Sophia S; Kane, Eleanor; Slager, Susan L; Zhou, Weiyin; Carrington, Mary; Lan, Qing; Milne, Roger L; Birmann, Brenda M; Adami, Hans-Olov; Albanes, Demetrius; Arslan, Alan A; Becker, Nikolaus; Benavente, Yolanda; Bisanzi, Simonetta; Boffetta, Paolo; Bracci, Paige M; Brennan, Paul; Brooks-Wilson, Angela R; Canzian, Federico; Caporaso, Neil; Clavel, Jacqueline; Cocco, Pierluigi; Conde, Lucia; Cox, David G; Cozen, Wendy; Curtin, Karen; De Vivo, Immaculata; de Sanjose, Silvia; Foretova, Lenka; Gapstur, Susan M; Ghesquières, Hervé; Giles, Graham G; Glenn, Martha; Glimelius, Bengt; Gao, Chi; Habermann, Thomas M; Hjalgrim, Henrik; Jackson, Rebecca D; Liebow, Mark; Link, Brian K; Maynadie, Marc; McKay, James; Melbye, Mads; Miligi, Lucia; Molina, Thierry J; Monnereau, Alain; Nieters, Alexandra.

J Transl Genet Genom ; 5: 200-217, 2021.

Article in English | MEDLINE | ID: mdl-34622145

ABSTRACT

AIM: Recessive genetic variation is thought to play a role in non-Hodgkin lymphoma (NHL) etiology. Runs of homozygosity (ROH), defined based on long, continuous segments of homozygous SNPs, can be used to estimate both measured and unmeasured recessive genetic variation. We sought to examine genome-wide homozygosity and NHL risk. METHODS: We used data from eight genome-wide association studies of four common NHL subtypes: 3061 chronic lymphocytic leukemia (CLL), 3814 diffuse large B-cell lymphoma (DLBCL), 2784 follicular lymphoma (FL), and 808 marginal zone lymphoma (MZL) cases, as well as 9374 controls. We examined the effect of homozygous variation on risk by: (1) estimating the fraction of the autosome containing runs of homozygosity (FROH); (2) calculating an inbreeding coefficient derived from the correlation among uniting gametes (F3); and (3) examining specific autosomal regions containing ROH. For each, we calculated beta coefficients and standard errors using logistic regression and combined estimates across studies using random-effects meta-analysis. RESULTS: We discovered positive associations between FROH and CLL (β = 21.1, SE = 4.41, P = 1.6 × 10⁻⁶) and FL (β = 11.4, SE = 5.82, P = 0.02) but not DLBCL (P = 1.0) or MZL (P = 0.91). For F3, we observed an association with CLL (β = 27.5, SE = 6.51, P = 2.4 × 10⁻⁵). We did not find evidence of associations with specific ROH, suggesting that the associations observed with FROH and F3 for CLL and FL risk were not driven by a single region of homozygosity. CONCLUSION: Our findings support the role of recessive genetic variation in the etiology of CLL and FL; additional research is needed to identify the specific loci associated with NHL risk.
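The two quantitative steps named in the METHODS above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the half-open base-pair coordinates, and the assumption that ROH segments do not overlap are all hypothetical, and the abstract does not say which random-effects estimator was used, so the DerSimonian-Laird estimator is assumed here.

```python
import math


def froh(roh_segments, autosome_length_bp):
    """Fraction of the autosome covered by runs of homozygosity (FROH).

    roh_segments: iterable of (start_bp, end_bp) pairs, assumed to be
    non-overlapping and half-open, so each spans end_bp - start_bp bases.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")
    covered = sum(end - start for start, end in roh_segments)
    return covered / autosome_length_bp


def random_effects_meta(betas, ses):
    """Combine per-study betas with a DerSimonian-Laird random-effects model.

    Returns (pooled_beta, pooled_se, tau2), where tau2 is the estimated
    between-study variance (truncated at zero).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    # Cochran's Q and the DerSimonian-Laird moment estimate of tau^2.
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled_beta = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))
    return pooled_beta, pooled_se, tau2
```

In this sketch, FROH is computed per individual and used as a predictor in each study's regression. The resulting study-level betas and standard errors are what `random_effects_meta` would pool.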

4.

Tracing the Distribution of European Lactase Persistence Genotypes Along the Americas.

Guimarães Alves, Ana Cecília; Sukow, Natalie Mary; Adelman Cipolla, Gabriel; Mendes, Marla; Leal, Thiago P; Petzl-Erler, Maria Luiza; Lehtonen Rodrigues Souza, Ricardo; Rainha de Souza, Ilíada; Sanchez, Cesar; Santolalla, Meddly; Loesch, Douglas; Dean, Michael; Machado, Moara; Moon, Jee-Young; Kaplan, Robert; North, Kari E; Weiss, Scott; Barreto, Mauricio L; Lima-Costa, M Fernanda; Guio, Heinner; Cáceres, Omar; Padilla, Carlos; Tarazona-Santos, Eduardo; Mata, Ignacio F; Dieguez, Elena; Raggio, Víctor; Lescano, Andres; Tumas, Vitor; Borges, Vanderci; Ferraz, Henrique B; Rieder, Carlos R; Schumacher-Schuh, Artur; Santos-Lobato, Bruno L; Chana-Cuevas, Pedro; Fernandez, William; Arboleda, Gonzalo; Arboleda, Humberto; Arboleda-Bustos, Carlos E; O'Connor, Timothy D; Beltrame, Marcia Holsbach; Borda, Victor.

Front Genet ; 12: 671079, 2021.

Article in English | MEDLINE | ID: mdl-34630506

ABSTRACT

In adulthood, the ability to digest lactose, the main sugar present in milk of mammals, is a phenotype (lactase persistence) observed in historically herder populations, mainly Northern Europeans, Eastern Africans, and Middle Eastern nomads. As the -13910*T allele in the MCM6 gene is the most well-characterized allele responsible for the lactase persistence phenotype, the -13910C > T (rs4988235) polymorphism is commonly evaluated in lactase persistence studies. Lactase non-persistent adults may develop symptoms of lactose intolerance when consuming dairy products. In the Americas, there is no evidence of the consumption of these products until the arrival of Europeans. However, several American countries' dietary guidelines recommend consuming dairy for adequate human nutrition and health promotion. Considering the extensive use of dairy and the complex ancestry of Pan-American admixed populations, we studied the distribution of -13910C > T lactase persistence genotypes and its flanking haplotypes of European origin in 7,428 individuals from several Pan-American admixed populations. We found that the -13910*T allele frequency in Pan-American admixed populations is directly correlated with allele frequency of the European sources. Moreover, we did not observe any overrepresentation of European haplotypes in the -13910C > T flanking region, suggesting no selective pressure after admixture in the Americas. Finally, considering the dominant effect of the -13910*T allele, our results indicate that Pan-American admixed populations are likely to have a higher frequency of lactose intolerance, suggesting that general dietary guidelines deserve further evaluation across the continent.

5.

Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong female-specific effect on body mass and fat mass indexes.

Scliar, Marilia O; Sant'Anna, Hanaisa P; Santolalla, Meddly L; Leal, Thiago P; Araújo, Nathalia M; Alvim, Isabela; Borda, Victor; Magalhães, Wagner C S; Gouveia, Mateus H; Lyra, Ricardo; Machado, Moara; Michelin, Lucas; Rodrigues, Maíra R; Araújo, Gilderlanio S; Kehdy, Fernanda S G; Zolini, Camila; Peixoto, Sérgio V; Luizon, Marcelo R; Lobo, Francisco; Naslavsky, Michel S; Yamamoto, Guilherme L; Duarte, Yeda A O; Hansen, Matthew E B; Norris, Shane A; Gilman, Robert H; Guio, Heinner; Hsing, Ann W; Mbulaiteye, Sam M; Mensah, James; Dutil, Julie; Yeager, Meredith; Yeboah, Edward; Tishkoff, Sarah A; Choudhury, Ananyo; Ramsay, Michele; Passos-Bueno, Maria Rita; Zatz, Mayana; O'Connor, Timothy D; Pereira, Alexandre C; Barreto, Mauricio L; Lima-Costa, Maria Fernanda; Horta, Bernardo L; Tarazona-Santos, Eduardo.

Int J Obes (Lond) ; 45(5): 1017-1029, 2021 05.

Article in English | MEDLINE | ID: mdl-33633342

ABSTRACT

BACKGROUND/OBJECTIVES: Admixed populations are a resource to study the global genetic architecture of complex phenotypes, which is critical, considering that non-European populations are severely underrepresented in genomic studies. Here, we study the genetic architecture of BMI in children, young adults, and elderly individuals from the admixed population of Brazil. SUBJECTS/METHODS: Leveraging admixture in Brazilians, whose chromosomes are mosaics of fragments of Native American, European, and African origins, we used genome-wide data to perform admixture mapping/fine-mapping of body mass index (BMI) in three Brazilian population-based cohorts from Northeast (Salvador), Southeast (Bambuí), and South (Pelotas). RESULTS: We found significant associations with African-associated alleles in children from Salvador (PALD1 and ZMIZ1 genes), and in young adults from Pelotas (NOD2 and MTUS2 genes). More importantly, in Pelotas, rs114066381, mapped in a potential regulatory region, is significantly associated only in females (p = 2.76e-06). This variant is rare in Europeans but reaches frequencies of ~3% in West Africa, and it has a strong female-specific effect (95% CI: 2.32-5.65 kg/m² per each A allele). We confirmed this sex-specific association and replicated its strong effect for an adjusted fat mass index in the same Pelotas cohort, and for BMI in another Brazilian cohort from São Paulo (Southeast Brazil). A meta-analysis confirmed the significant association. Remarkably, we observed that while the frequency of the rs114066381-A allele ranges from 0.8 to 2.1% in the studied populations, it attains ~9% among women with morbid obesity from Pelotas, São Paulo, and Bambuí. The effect size of rs114066381 is at least five times higher than that of the FTO SNPs rs9939609 and rs1558902, already emblematic for their high effects. CONCLUSIONS: We identified six candidate SNPs associated with BMI. rs114066381 stands out for its strong, replicated effect and its high frequency in women with morbid obesity. We demonstrate how admixed populations are a source of new relevant phenotype-associated genetic variants.

Subject(s)

Body Mass Index , Genetics, Population , Polymorphism, Single Nucleotide , Aged , Aged, 80 and over , Alleles , Brazil , Child , Child, Preschool , Chromosome Mapping , Female , Humans , Male , Middle Aged , Phenotype , Regulatory Sequences, Nucleic Acid , Sex Factors , Young Adult

6.

Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong female-specific effect on body mass- and fat mass- indexes

Scliar, Marilia O.; Sant'Anna, Hanaisa P.; Santolalla, Meddly L.; Leal, Thiago P.; Araújo, Nathalia M.; Alvim, Isabela; Borda, Victor; Magalhães, Wagner C. S.; Gouveia, Mateus H.; Lyra, Ricardo; Machado, Moara; Michelin, Lucas; Rodrigues, Maíra R.; Araújo, Gilderlanio S.; Kehdy, Fernanda S. G.; Zolini, Camila; Peixoto, Sérgio Viana; Luizon, Marcelo; Lobo, Francisco; Naslavsky, Michel S.; Yamamoto, Guilherme L.; Duarte, Yeda A. O.; Hansen, Matthew E. B.; Norris, Shane A.; Gilman, Robert H.; Guio, Heinner; Hsing, Ann W.; Mbulaiteye, Sam M.; Mensah, James; Dutil, Julie; Yeager, Meredith; Yeboah, Edward; Tishkoff, Sarah A.; Choudhury, Ananyo; Ramsay, Michele; Bueno, Maria Rita Passos; Zatz, Mayana; O'Connor, Timothy D.; Pereira, Alexandre C.; Barreto, Mauricio Lima; Costa, Maria Fernanda Lima; Horta, Bernardo L.; Santos, Eduardo Tarazona.

Preprint in English | Fiocruz Preprints | ID: ppf-48002

7.

The genetic structure and adaptation of Andean highlanders and Amazonians are influenced by the interplay between geography and culture.

Borda, Víctor; Alvim, Isabela; Mendes, Marla; Silva-Carvalho, Carolina; Soares-Souza, Giordano B; Leal, Thiago P; Furlan, Vinicius; Scliar, Marilia O; Zamudio, Roxana; Zolini, Camila; Araújo, Gilderlanio S; Luizon, Marcelo R; Padilla, Carlos; Cáceres, Omar; Levano, Kelly; Sánchez, César; Trujillo, Omar; Flores-Villanueva, Pedro O; Dean, Michael; Fuselli, Silvia; Machado, Moara; Romero, Pedro E; Tassi, Francesca; Yeager, Meredith; O'Connor, Timothy D; Gilman, Robert H; Tarazona-Santos, Eduardo; Guio, Heinner.

Proc Natl Acad Sci U S A ; 117(51): 32557-32565, 2020 12 22.

Article in English | MEDLINE | ID: mdl-33277433

ABSTRACT

Western South America was one of the worldwide cradles of civilization. The well-known Inca Empire was the tip of the iceberg of an evolutionary process that started 11,000 to 14,000 years ago. Genetic data from 18 Peruvian populations reveal the following: 1) The between-population homogenization of the central southern Andes and its differentiation with respect to Amazonian populations of similar latitudes do not extend northward. Instead, longitudinal gene flow between the northern coast of Peru, Andes, and Amazonia accompanied cultural and socioeconomic interactions revealed by archeology. This pattern recapitulates the environmental and cultural differentiation between the fertile north, where altitudes are lower, and the arid south, where the Andes are higher, acting as a genetic barrier between the sharply different environments of the Andes and Amazonia. 2) The genetic homogenization between the populations of the arid Andes is not only due to migrations during the Inca Empire or the subsequent colonial period. It started at least during the earlier expansion of the Wari Empire (600 to 1,000 years before present). 3) This demographic history allowed for cases of positive natural selection in the high and arid Andes vs. the low Amazon tropical forest: in the Andes, a putative enhancer in HAND2-AS1 (heart and neural crest derivatives expressed 2 antisense RNA1, a noncoding gene related to cardiovascular function) and rs269868-C/Ser1067 in DUOX2 (dual oxidase 2, related to thyroid function and innate immunity) genes and, in the Amazon, the gene encoding for the CD45 protein, essential for antigen recognition by T and B lymphocytes in viral-host interaction.

Subject(s)

Adaptation, Physiological/genetics , Indians, South American/genetics , Altitude , Civilization , Climate , Dual Oxidases/genetics , Gene Flow , Gene Frequency , Genetics, Population , Humans , Leukocyte Common Antigens/genetics , Peru/ethnology , Polymorphism, Single Nucleotide , RNA, Long Noncoding/genetics , Rainforest , Selection, Genetic , Socioeconomic Factors , T-Box Domain Proteins/genetics

8.

XAF1 as a modifier of p53 function and cancer susceptibility.

Pinto, Emilia M; Figueiredo, Bonald C; Chen, Wenan; Galvao, Henrique C R; Formiga, Maria Nirvana; Fragoso, Maria Candida B V; Ashton-Prolla, Patricia; Ribeiro, Enilze M S F; Felix, Gabriela; Costa, Tatiana E B; Savage, Sharon A; Yeager, Meredith; Palmero, Edenir I; Volc, Sahlua; Salvador, Hector; Fuster-Soler, Jose Luis; Lavarino, Cinzia; Chantada, Guillermo; Vaur, Dominique; Odone-Filho, Vicente; Brugières, Laurence; Else, Tobias; Stoffel, Elena M; Maxwell, Kara N; Achatz, Maria Isabel; Kowalski, Luis; de Andrade, Kelvin C; Pappo, Alberto; Letouze, Eric; Latronico, Ana Claudia; Mendonca, Berenice B; Almeida, Madson Q; Brondani, Vania B; Bittar, Camila M; Soares, Emerson W S; Mathias, Carolina; Ramos, Cintia R N; Machado, Moara; Zhou, Weiyin; Jones, Kristine; Vogt, Aurelie; Klincha, Payal P; Santiago, Karina M; Komechen, Heloisa; Paraizo, Mariana M; Parise, Ivy Z S; Hamilton, Kayla V; Wang, Jinling; Rampersaud, Evadnie; Clay, Michael R.

Sci Adv ; 6(26): eaba3231, 2020 06.

Article in English | MEDLINE | ID: mdl-32637605

ABSTRACT

Cancer risk is highly variable in carriers of the common TP53-R337H founder allele, possibly due to the influence of modifier genes. Whole-genome sequencing identified a variant in the tumor suppressor XAF1 (E134*/Glu134Ter/rs146752602) in a subset of R337H carriers. Haplotype-defining variants were verified in 203 patients with cancer, 582 relatives, and 42,438 newborns. The compound mutant haplotype was enriched in patients with cancer, conferring risk for sarcoma (P = 0.003) and subsequent malignancies (P = 0.006). Functional analyses demonstrated that wild-type XAF1 enhances transactivation of wild-type and hypomorphic TP53 variants, whereas XAF1-E134* is markedly attenuated in this activity. We propose that cosegregation of XAF1-E134* and TP53-R337H mutations leads to a more aggressive cancer phenotype than TP53-R337H alone, with implications for genetic counseling and clinical management of hypomorphic TP53 mutant carriers.

9.

Origins, Admixture Dynamics, and Homogenization of the African Gene Pool in the Americas.

Gouveia, Mateus H; Borda, Victor; Leal, Thiago P; Moreira, Rennan G; Bergen, Andrew W; Kehdy, Fernanda S G; Alvim, Isabela; Aquino, Marla M; Araujo, Gilderlanio S; Araujo, Nathalia M; Furlan, Vinicius; Liboredo, Raquel; Machado, Moara; Magalhaes, Wagner C S; Michelin, Lucas A; Rodrigues, Maíra R; Rodrigues-Soares, Fernanda; Sant Anna, Hanaisa P; Santolalla, Meddly L; Scliar, Marília O; Soares-Souza, Giordano; Zamudio, Roxana; Zolini, Camila; Bortolini, Maria Catira; Dean, Michael; Gilman, Robert H; Guio, Heinner; Rocha, Jorge; Pereira, Alexandre C; Barreto, Mauricio L; Horta, Bernardo L; Lima-Costa, Maria F; Mbulaiteye, Sam M; Chanock, Stephen J; Tishkoff, Sarah A; Yeager, Meredith; Tarazona-Santos, Eduardo.

Mol Biol Evol ; 37(6): 1647-1656, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32128591

ABSTRACT

The Transatlantic Slave Trade transported more than 9 million Africans to the Americas between the early 16th and the mid-19th centuries. We performed a genome-wide analysis using 6,267 individuals from 25 populations to infer how different African groups contributed to North-, South-American, and Caribbean populations, in the context of geographic and geopolitical factors, and compared genetic data with demographic history records of the Transatlantic Slave Trade. We observed that West-Central Africa and Western Africa-associated ancestry clusters are more prevalent in northern latitudes of the Americas, whereas the South/East Africa-associated ancestry cluster is more prevalent in southern latitudes of the Americas. This pattern results from geographic and geopolitical factors leading to population differentiation. However, there is a substantial decrease in the between-population differentiation of the African gene pool within the Americas, when compared with the regions of origin from Africa, underscoring the importance of historical factors favoring admixture between individuals with different African origins in the New World. This between-population homogenization in the Americas is consistent with the excess of West-Central Africa ancestry (the most prevalent in the Americas) in the United States and Southeast-Brazil, with respect to historical-demography expectations. We also inferred that in most of the Americas, intercontinental admixture intensification occurred between 1750 and 1850, which correlates strongly with the peak of arrivals from Africa. This study contributes with a population genetics perspective to the ongoing social, cultural, and political debate regarding ancestry, admixture, and the mestizaje process in the Americas.

Subject(s)

Black People/genetics , Enslavement/history , Gene Pool , Genome, Human , Human Migration/history , Africa , Americas , History, 16th Century , History, 17th Century , History, 18th Century , History, 19th Century , Humans , Phylogeography

10.

Population genetics of immune-related multilocus copy number variation in Native Americans.

Zuccherato, Luciana W; Schneider, Silvana; Tarazona-Santos, Eduardo; Hardwick, Robert J; Berg, Douglas E; Bogle, Helen; Gouveia, Mateus H; Machado, Lee R; Machado, Moara; Rodrigues-Soares, Fernanda; Soares-Souza, Giordano B; Togni, Diego L; Zamudio, Roxana; Gilman, Robert H; Duarte, Denise; Hollox, Edward J; Rodrigues, Maíra R.

J R Soc Interface ; 14(128), 2017 03.

Article in English | MEDLINE | ID: mdl-28356540

ABSTRACT

While multiallelic copy number variation (mCNV) loci are a major component of genomic variation, quantifying the individual copy number of a locus and defining genotypes is challenging. Few methods exist to study how mCNV genetic diversity is apportioned within and between populations (i.e. to define the population genetic structure of mCNV). These inferences are critical in populations with a small effective size, such as Amerindians, that may not fit the Hardy-Weinberg model due to inbreeding, assortative mating, population subdivision, natural selection or a combination of these evolutionary factors. We propose a likelihood-based method that simultaneously infers mCNV allele frequencies and the population structure parameter f, which quantifies the departure of homozygosity from the Hardy-Weinberg expectation. This method is implemented in the freely available software CNVice, which also infers individual genotypes using information from both the population and from trios, if available. We studied the population genetics of five immune-related mCNV loci associated with complex diseases (beta-defensins, CCL3L1/CCL4L1, FCGR3A, FCGR3B and FCGR2C) in 12 traditional Native American populations and found that the population structure parameters inferred for these mCNVs are comparable to but lower than those for single nucleotide polymorphisms studied in the same populations.

Subject(s)

Alleles , Gene Frequency/immunology , Genetic Loci/immunology , Models, Genetic , Polymorphism, Single Nucleotide , Female , Genetics, Population , Humans , Indians, South American , Male , Multilocus Sequence Typing , Peru
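The population structure parameter f in the abstract above can be illustrated with a short Python sketch. It uses the standard inbreeding-coefficient parametrization of genotype frequencies, which is an assumption: the abstract does not give the likelihood used in CNVice, and the function name and the dictionary input format are hypothetical. Alleles are keyed by an arbitrary label, for example the copy number carried on one chromosome.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs, f):
    """Expected genotype frequencies at a multiallelic locus, given f.

    allele_freqs: dict mapping allele label -> frequency (must sum to 1).
    f: departure from Hardy-Weinberg; f = 0 gives Hardy-Weinberg
       proportions, f > 0 an excess of homozygotes.
    Returns a dict mapping (allele_a, allele_b) with a <= b -> frequency.
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        if a == b:
            # Homozygote: Hardy-Weinberg term plus the excess due to f.
            freqs[(a, b)] = pa * pa + f * pa * (1.0 - pa)
        else:
            # Heterozygote: Hardy-Weinberg term shrunk by (1 - f).
            freqs[(a, b)] = 2.0 * pa * pb * (1.0 - f)
    return freqs
```

Under this parametrization the genotype frequencies always sum to 1. A likelihood-based method could, in principle, search over allele frequencies and f jointly to maximize the probability of the observed copy-number data.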

11.

Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia.

Berndt, Sonja I; Camp, Nicola J; Skibola, Christine F; Vijai, Joseph; Wang, Zhaoming; Gu, Jian; Nieters, Alexandra; Kelly, Rachel S; Smedby, Karin E; Monnereau, Alain; Cozen, Wendy; Cox, Angela; Wang, Sophia S; Lan, Qing; Teras, Lauren R; Machado, Moara; Yeager, Meredith; Brooks-Wilson, Angela R; Hartge, Patricia; Purdue, Mark P; Birmann, Brenda M; Vajdic, Claire M; Cocco, Pierluigi; Zhang, Yawei; Giles, Graham G; Zeleniuch-Jacquotte, Anne; Lawrence, Charles; Montalvan, Rebecca; Burdett, Laurie; Hutchinson, Amy; Ye, Yuanqing; Call, Timothy G; Shanafelt, Tait D; Novak, Anne J; Kay, Neil E; Liebow, Mark; Cunningham, Julie M; Allmer, Cristine; Hjalgrim, Henrik; Adami, Hans-Olov; Melbye, Mads; Glimelius, Bengt; Chang, Ellen T; Glenn, Martha; Curtin, Karen; Cannon-Albright, Lisa A; Diver, W Ryan; Link, Brian K; Weiner, George J; Conde, Lucia.

Nat Commun ; 7: 10933, 2016 Mar 09.

Article in English | MEDLINE | ID: mdl-26956414

ABSTRACT

Chronic lymphocytic leukemia (CLL) is a common lymphoid malignancy with strong heritability. To further understand the genetic susceptibility for CLL and identify common loci associated with risk, we conducted a meta-analysis of four genome-wide association studies (GWAS) composed of 3,100 cases and 7,667 controls with follow-up replication in 1,958 cases and 5,530 controls. Here we report three new loci at 3p24.1 (rs9880772, EOMES, P=2.55 × 10⁻¹¹), 6p25.2 (rs73718779, SERPINB6, P=1.97 × 10⁻⁸) and 3q28 (rs9815073, LPP, P=3.62 × 10⁻⁸), as well as a new independent SNP at the known 2q13 locus (rs9308731, BCL2L11, P=1.00 × 10⁻¹¹) in the combined analysis. We find suggestive evidence (P<5 × 10⁻⁷) for two additional new loci at 4q24 (rs10028805, BANK1, P=7.19 × 10⁻⁸) and 3p22.2 (rs1274963, CSRNP1, P=2.12 × 10⁻⁷). Pathway analyses of new and known CLL loci consistently show a strong role for apoptosis, providing further evidence for the importance of this biological pathway in CLL susceptibility.

Subject(s)

Genome-Wide Association Study , Leukemia, Lymphocytic, Chronic, B-Cell/genetics , White People/genetics , Adaptor Proteins, Signal Transducing/genetics , Apoptosis Regulatory Proteins/genetics , Bcl-2-Like Protein 11 , Genetic Predisposition to Disease , Humans , Membrane Proteins/genetics , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins/genetics , Serpins/genetics , T-Box Domain Proteins/genetics

12.

A minimum set of ancestry informative markers for determining admixture proportions in a mixed American population: the Brazilian set.

Santos, Hadassa C; Horimoto, Andréa V R; Tarazona-Santos, Eduardo; Rodrigues-Soares, Fernanda; Barreto, Mauricio L; Horta, Bernardo L; Lima-Costa, Maria F; Gouveia, Mateus H; Machado, Moara; Silva, Thiago M; Sanches, José M; Esteban, Nubia; Magalhaes, Wagner C S; Rodrigues, Maíra R; Kehdy, Fernanda S G; Pereira, Alexandre C.

Eur J Hum Genet ; 24(5): 725-31, 2016 May.

Article in English | MEDLINE | ID: mdl-26395555

ABSTRACT

The Brazilian population is considered to be highly admixed. The main contributing ancestral populations were European and African, with Amerindians contributing to a lesser extent. The aims of this study were to provide a resource for determining and quantifying individual continental ancestry using the smallest number of SNPs possible, thus allowing for a cost- and time-efficient strategy for genomic ancestry determination. We identified and validated a minimum set of 192 ancestry informative markers (AIMs) for the genetic ancestry determination of Brazilian populations. These markers were selected on the basis of their distribution throughout the human genome and their capacity to be genotyped on widely available commercial platforms. We analyzed genotyping data from 6487 individuals belonging to three Brazilian cohorts. Estimates of individual admixture using this 192-AIM panel were highly correlated with estimates using ~370 000 genome-wide SNPs: 91%, 92%, and 74% for the African, European, and Native American ancestry components, respectively. In addition, the 192 AIMs are well distributed among populations from these ancestral continents, allowing greater freedom in future studies with this panel regarding the choice of reference populations. We also observed that genetic ancestry inferred by AIMs provides association results similar to those obtained using ancestry inferred from genomic data (370 K SNPs) in a simple regression model with genotypes of rs1426654, related to skin pigmentation, as the dependent variable. In conclusion, these markers can be used to identify and accurately quantify the ancestry of Latin American or US Hispanic/Latino individuals, in particular in the context of fine-mapping strategies that require the quantification of continental ancestry in thousands of individuals.

Subject(s)

Genome, Human , Polymorphism, Single Nucleotide , Population/genetics , American Indian or Alaska Native , Black People , Brazil , Genetic Markers , Humans , Pedigree , Skin Pigmentation/genetics , White People

13.

Deep sequencing of HPV16 genomes: A new high-throughput tool for exploring the carcinogenicity and natural history of HPV16 infection.

Cullen, Michael; Boland, Joseph F; Schiffman, Mark; Zhang, Xijun; Wentzensen, Nicolas; Yang, Qi; Chen, Zigui; Yu, Kai; Mitchell, Jason; Roberson, David; Bass, Sara; Burdette, Laurie; Machado, Moara; Ravichandran, Sarangan; Luke, Brian; Machiela, Mitchell J; Andersen, Mark; Osentoski, Matt; Laptewicz, Michael; Wacholder, Sholom; Feldman, Ashlie; Raine-Bennett, Tina; Lorey, Thomas; Castle, Philip E; Yeager, Meredith; Burk, Robert D; Mirabello, Lisa.

Papillomavirus Res ; 1: 3-11, 2015 Dec 01.

Article in English | MEDLINE | ID: mdl-26645052

ABSTRACT

For unknown reasons, there is huge variability in risk conferred by different HPV types and, remarkably, strong differences even between closely related variant lineages within each type. HPV16 is a uniquely powerful carcinogenic type, causing approximately half of cervical cancer and most other HPV-related cancers. To permit the large-scale study of HPV genome variability and precancer/cancer, starting with HPV16 and cervical cancer, we developed a high-throughput next-generation sequencing (NGS) whole-genome method. We designed a custom HPV16 AmpliSeq™ panel that generated 47 overlapping amplicons covering 99% of the genome sequenced on the Ion Torrent Proton platform. After validating with Sanger, the current "gold standard" of sequencing, in 89 specimens with concordance of 99.9%, we used our NGS method and custom annotation pipeline to sequence 796 HPV16-positive exfoliated cervical cell specimens. The median completion rate per sample was 98.0%. Our method enabled us to discover novel SNPs, large contiguous deletions suggestive of viral integration (OR of 27.3, 95% CI 3.3-222, P=0.002), and the sensitive detection of variant lineage coinfections. This method represents an innovative high-throughput, ultra-deep coverage technique for HPV genomic sequencing, which, in turn, enables the investigation of the role of genetic variation in HPV epidemiology and carcinogenesis.

14.

Origin and dynamics of admixture in Brazilians and its effect on the pattern of deleterious mutations.

Kehdy, Fernanda S G; Gouveia, Mateus H; Machado, Moara; Magalhães, Wagner C S; Horimoto, Andrea R; Horta, Bernardo L; Moreira, Rennan G; Leal, Thiago P; Scliar, Marilia O; Soares-Souza, Giordano B; Rodrigues-Soares, Fernanda; Araújo, Gilderlanio S; Zamudio, Roxana; Sant Anna, Hanaisa P; Santos, Hadassa C; Duarte, Nubia E; Fiaccone, Rosemeire L; Figueiredo, Camila A; Silva, Thiago M; Costa, Gustavo N O; Beleza, Sandra; Berg, Douglas E; Cabrera, Lilia; Debortoli, Guilherme; Duarte, Denise; Ghirotto, Silvia; Gilman, Robert H; Gonçalves, Vanessa F; Marrero, Andrea R; Muniz, Yara C; Weissensteiner, Hansi; Yeager, Meredith; Rodrigues, Laura C; Barreto, Mauricio L; Lima-Costa, M Fernanda; Pereira, Alexandre C; Rodrigues, Maíra R; Tarazona-Santos, Eduardo.

Proc Natl Acad Sci U S A ; 112(28): 8696-701, 2015 Jul 14.

Article in English | MEDLINE | ID: mdl-26124090

ABSTRACT

While South Americans are underrepresented in human genomic diversity studies, Brazil has been a classical model for population genetics studies on admixture. We present the results of the EPIGEN Brazil Initiative, the most comprehensive up-to-date genomic analysis of any Latin-American population. A population-based genome-wide analysis of 6,487 individuals was performed in the context of worldwide genomic diversity to elucidate how ancestry, kinship, and inbreeding interact in three populations with different histories from the Northeast (African ancestry: 50%), Southeast, and South (both with European ancestry >70%) of Brazil. We showed that ancestry-positive assortative mating permeated Brazilian history. We traced European ancestry in the Southeast/South to a wider European/Middle Eastern region with respect to the Northeast, where ancestry seems restricted to Iberia. By developing an approximate Bayesian computation framework, we infer more recent European immigration to the Southeast/South than to the Northeast. Also, the observed low Native-American ancestry (6-8%) was mostly introduced in different regions of Brazil soon after the European Conquest. We broadened our understanding of the African diaspora, the major destination of which was Brazil, by revealing that Brazilians display two within-Africa ancestry components: one associated with non-Bantu/western Africans (more evident in the Northeast and African Americans) and one associated with Bantu/eastern Africans (more present in the Southeast/South). Furthermore, the whole-genome analysis of 30 individuals (42-fold deep coverage) shows that continental admixture rather than local post-Columbian history is the main and complex determinant of the individual amount of deleterious genotypes.

Subject(s)

Genetics, Population , Mutation , Black People/genetics , Brazil , Humans , White People/genetics

15.

Evolutionary dynamics of the human NADPH oxidase genes CYBB, CYBA, NCF2, and NCF4: functional implications.

Tarazona-Santos, Eduardo; Machado, Moara; Magalhães, Wagner C S; Chen, Renee; Lyon, Fernanda; Burdett, Laurie; Crenshaw, Andrew; Fabbri, Cristina; Pereira, Latife; Pinto, Laelia; Redondo, Rodrigo A F; Sestanovich, Ben; Yeager, Meredith; Chanock, Stephen J.

Mol Biol Evol ; 30(9): 2157-67, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23821607

ABSTRACT

The phagocyte NADPH oxidase catalyzes the reduction of O2 to reactive oxygen species with microbicidal activity. It is composed of two membrane-spanning subunits, gp91-phox and p22-phox (encoded by CYBB and CYBA, respectively), and three cytoplasmic subunits, p40-phox, p47-phox, and p67-phox (encoded by NCF4, NCF1, and NCF2, respectively). Mutations in any of these genes can result in chronic granulomatous disease, a primary immunodeficiency characterized by recurrent infections. Using evolutionary mapping, we determined that episodes of adaptive natural selection have shaped the extracellular portion of gp91-phox during the evolution of mammals, which suggests that this region may have a function in host-pathogen interactions. On the basis of a resequencing analysis of approximately 35 kb of CYBB, CYBA, NCF2, and NCF4 in 102 ethnically diverse individuals (24 of African ancestry, 31 of European ancestry, 24 of Asian/Oceanians, and 23 US Hispanics), we show that the pattern of CYBA diversity is compatible with balancing natural selection, perhaps mediated by catalase-positive pathogens. NCF2 in Asian populations shows a pattern of diversity characterized by a differentiated haplotype structure. Our study provides insight into the role of pathogen-driven natural selection in an innate immune pathway and sheds light on the role of CYBA in endothelial, nonphagocytic NADPH oxidases, which are relevant in the pathogenesis of cardiovascular and other complex diseases.

Subject(s)

Bacterial Infections/genetics , Granulomatous Disease, Chronic/genetics , Membrane Glycoproteins/genetics , NADPH Oxidases/genetics , Amino Acid Sequence , Animals , Asian People , Bacteria/enzymology , Bacterial Infections/complications , Bacterial Infections/enzymology , Bacterial Infections/ethnology , Bacterial Proteins/metabolism , Black People , Catalase/metabolism , Evolution, Molecular , Genetic Variation , Granulomatous Disease, Chronic/complications , Granulomatous Disease, Chronic/enzymology , Granulomatous Disease, Chronic/ethnology , Haplotypes , Host-Pathogen Interactions , Humans , Membrane Glycoproteins/classification , Molecular Sequence Data , Mutation , NADPH Oxidase 2 , NADPH Oxidases/classification , Phylogeny , Selection, Genetic , White People

16.

Genetic interaction between NAT2, GSTM1, GSTT1, CYP2E1, and environmental factors is associated with adverse reactions to anti-tuberculosis drugs.

Costa, Gustavo N O; Magno, Luiz A V; Santana, Cinthia V N; Konstantinovas, Cibele; Saito, Samuel T; Machado, Moara; Di Pietro, Giuliano; Bastos-Rodrigues, Luciana; Miranda, Débora M; De Marco, Luiz A; Romano-Silva, Marco A; Rios-Santos, Fabrício.

Mol Diagn Ther ; 16(4): 241-50, 2012 Aug 01.

Article in English | MEDLINE | ID: mdl-22788240

ABSTRACT

BACKGROUND: Adverse drug reactions (ADRs) associated with anti-tuberculosis (anti-TB) drug regimens have considerable impact on anti-TB treatment, potentially leading to unsuccessful outcomes. Nevertheless, the risk factors that play a role in anti-TB drug-induced ADRs are not well established. It is well documented that genetic polymorphisms in drug-metabolizing enzymes (DMEs) result in considerably complex variability in anti-TB drug disposition. In addition, the impact of pharmacogenetic variation on the metabolism of anti-TB drugs may be modifiable by environmental exposure. Thus, an assessment of pharmacogenetic variability combined with biomarkers of environmental exposure may be helpful for demonstrating the effect of the gene-environment interaction on susceptibility to ADRs induced by anti-TB drug therapy. OBJECTIVE: The aim of the study was to investigate the impact of the interaction between environmental risk factors and pharmacogenetic polymorphisms in four common DMEs--N-acetyltransferase 2 (arylamine N-acetyltransferase) [NAT2], glutathione S-transferase theta 1 [GSTT1], glutathione S-transferase mu 1 [GSTM1], and cytochrome P450 2E1 [CYP2E1]--on commonly reported ADRs to first-line anti-TB drugs in 129 patients receiving homogeneous TB treatment. METHODS: TB patients monitored during drug treatment were divided into subgroups according to the presence or absence of ADRs. Additionally, the patients' clinical and demographic characteristics were collected in order to identify the environmental factors that are potential triggers for ADRs induced by anti-TB drug treatment. Pharmacogenetic variability was determined by gene sequencing, TaqMan® assays, or polymerase chain reaction. RESULTS: The findings of this study suggest that the NAT2 slow acetylator haplotype, female sex, and smoking are important determinants of susceptibility to ADRs induced by anti-TB drugs. 
Patients carrying multiple, but not single, polymorphisms in the NAT2, GSTM1, GSTT1, and CYP2E1 genes were found to have an increased risk of ADRs, as revealed by gene-gene interaction analysis. Moreover, we also identified meaningful gene-environment interaction models that resulted in the highest levels of ADR risk. CONCLUSION: The study findings provide evidence of the clinical impact of the interaction between pharmacogenetic variability and environmental factors on ADRs induced by anti-TB drug therapy. Predictive pharmacogenetic testing and a comprehensive clinical history would therefore be helpful for identification and careful monitoring of patients at high risk of this complication.

Subject(s)

Antitubercular Agents/adverse effects , Arylamine N-Acetyltransferase/genetics , Cytochrome P-450 CYP2E1/genetics , Glutathione Transferase/genetics , Tuberculosis/drug therapy , Tuberculosis/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Antitubercular Agents/therapeutic use , Female , Genetic Predisposition to Disease/genetics , Genotype , Haplotypes , Humans , Male , Middle Aged , Polymorphism, Genetic , Risk Factors , Young Adult

17.

A graph-based approach for designing extensible pipelines.

Rodrigues, Maíra R; Magalhães, Wagner C S; Machado, Moara; Tarazona-Santos, Eduardo.

BMC Bioinformatics ; 13: 163, 2012 Jul 12.

Article in English | MEDLINE | ID: mdl-22788675

ABSTRACT

BACKGROUND: In bioinformatics, it is important to build extensible and low-maintenance systems that are able to deal with the new tools and data formats that are constantly being developed. The traditional and simplest implementation of pipelines involves hardcoding the execution steps into programs or scripts. This approach can lead to problems when a pipeline is expanding because the incorporation of new tools is often error prone and time consuming. Current approaches to pipeline development such as workflow management systems focus on analysis tasks that are systematically repeated without significant changes in their course of execution, such as genome annotation. However, more dynamism on the pipeline composition is necessary when each execution requires a different combination of steps. RESULTS: We propose a graph-based approach to implement extensible and low-maintenance pipelines that is suitable for pipeline applications with multiple functionalities that require different combinations of steps in each execution. Here pipelines are composed automatically by compiling a specialised set of tools on demand, depending on the functionality required, instead of specifying every sequence of tools in advance. We represent the connectivity of pipeline components with a directed graph in which components are the graph edges, their inputs and outputs are the graph nodes, and the paths through the graph are pipelines. To that end, we developed special data structures and a pipeline system algorithm. We demonstrate the applicability of our approach by implementing a format conversion pipeline for the fields of population genetics and genetic epidemiology, but our approach is also helpful in other fields where the use of multiple software is necessary to perform comprehensive analyses, such as gene expression and proteomics analyses. 
The project code, documentation and the Java executables are available under an open source license at http://code.google.com/p/dynamic-pipeline. The system has been tested on Linux and Windows platforms. CONCLUSIONS: Our graph-based approach enables the automatic creation of pipelines by compiling a specialised set of tools on demand, depending on the functionality required. It also allows the implementation of extensible and low-maintenance pipelines and contributes towards consolidating openness and collaboration in bioinformatics systems. It is targeted at pipeline developers and is suited for implementing applications with sequential execution steps and combined functionalities. In the format conversion application, the automatic combination of conversion tools increased both the number of possible conversions available to the user and the extensibility of the system to allow for future updates with new file formats.
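The graph representation described in this abstract (data formats as nodes, tools as edges, paths as pipelines) can be sketched in a few lines. This is a minimal illustration only, not the authors' Java implementation: the tool names and formats below are hypothetical, and breadth-first search is used here as one simple way to find a path; the abstract does not specify the search algorithm.

```python
from collections import deque

# Hypothetical converters, each given as (tool name, input format, output format).
# None of these names come from the dynamic-pipeline project.
TOOLS = [
    ("ped2vcf", "ped", "vcf"),
    ("vcf2fasta", "vcf", "fasta"),
    ("fasta2nexus", "fasta", "nexus"),
    ("ped2phase", "ped", "phase"),
]


def build_graph(tools):
    """Map each input format (node) to its outgoing tools (edges)."""
    graph = {}
    for name, src, dst in tools:
        graph.setdefault(src, []).append((name, dst))
    return graph


def compose_pipeline(graph, source, target):
    """Return the shortest list of tool names converting source to target.

    Uses breadth-first search (an assumption of this sketch).
    Returns None when no chain of tools connects the two formats.
    """
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for name, nxt in graph.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None


graph = build_graph(TOOLS)
print(compose_pipeline(graph, "ped", "nexus"))
```

Adding a new tool in this sketch means appending one tuple to `TOOLS`; every conversion that the new edge makes reachable becomes available without editing any existing pipeline, which is the extensibility property the abstract emphasises.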

Subject(s)

Computational Biology/methods , Software , Algorithms , Genome , Proteomics , Workflow

18.

Population genetics of GYPB and association study between GYPB*S/s polymorphism and susceptibility to P. falciparum infection in the Brazilian Amazon.

Tarazona-Santos, Eduardo; Castilho, Lilian; Amaral, Daphne R T; Costa, Daiane C; Furlani, Natália G; Zuccherato, Luciana W; Machado, Moara; Reid, Marion E; Zalis, Mariano G; Rossit, Andréa R; Santos, Sidney E B; Machado, Ricardo L; Lustigman, Sara.

PLoS One ; 6(1): e16123, 2011 Jan 24.

Article in English | MEDLINE | ID: mdl-21283638

ABSTRACT

BACKGROUND: Merozoites of Plasmodium falciparum invade through several pathways using different RBC receptors. Field isolates appear to use a greater variability of these receptors than laboratory isolates. Brazilian field isolates were shown to mostly utilize glycophorin A-independent invasion pathways via glycophorin B (GPB) and/or other receptors. The Brazilian population exhibits extensive polymorphism in blood group antigens; however, no studies have been done to relate the prevalence of the antigens that function as receptors for P. falciparum and the ability of the parasite to invade. Our study aimed to establish whether variation in the GYPB*S/s alleles influences susceptibility to infection with P. falciparum in the admixed population of Brazil. METHODS: Two groups of Brazilian Amazonians from Porto Velho were studied: P. falciparum infected individuals (cases); and uninfected individuals who were born and/or have lived in the same endemic region for over ten years, were exposed to infection but have not had malaria over the study period (controls). The GPB Ss phenotype and GYPB*S/s alleles were determined by standard methods. Sixty-two Ancestry Informative Markers were genotyped on each individual to estimate admixture and control its potential effect on the association between frequency of GYPB*S and malaria infection. RESULTS: GYPB*S is associated with host susceptibility to infection with P. falciparum; GYPB*S/GYPB*S and GYPB*S/GYPB*s were significantly more prevalent in the P. falciparum infected individuals than in the controls (69.87% vs. 49.75%; P<0.02). Moreover, population genetics tests applied on the GYPB exon sequencing data suggest that natural selection shaped the observed pattern of nucleotide diversity. CONCLUSION: Epidemiological and evolutionary approaches suggest an important role for the GPB receptor in RBC invasion by P. falciparum in Brazilian Amazons. 
Moreover, an increased susceptibility to infection by this parasite is associated with the GPB S+ variant in this population.
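To give a rough sense of the effect size behind the reported prevalences (69.87% of cases vs. 49.75% of controls carrying GYPB*S), the crude odds ratio implied by those two percentages can be computed as below. This is an illustrative, unadjusted calculation and not a result from the paper: the authors' analysis controlled for admixture, and because the abstract gives no group sizes, no confidence interval can be derived here.

```python
def odds_ratio(p_cases, p_controls):
    """Crude odds ratio from the carrier proportion in cases and in controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls


# Proportions of GYPB*S carriers as reported in the abstract.
crude_or = odds_ratio(0.6987, 0.4975)
print(round(crude_or, 2))
```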

Subject(s)

Genetic Predisposition to Disease/genetics , Genetics, Population , Glycophorins/genetics , Malaria, Falciparum/genetics , Plasmodium falciparum/physiology , Polymorphism, Genetic/genetics , Alleles , Brazil/epidemiology , Case-Control Studies , Endemic Diseases , Gene Frequency , Genetic Markers , Humans , Malaria, Falciparum/epidemiology

19.

Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies.

Machado, Moara; Magalhães, Wagner Cs; Sene, Allan; Araújo, Bruno; Faria-Campos, Alessandra C; Chanock, Stephen J; Scott, Leandro; Oliveira, Guilherme; Tarazona-Santos, Eduardo; Rodrigues, Maira R.

Investig Genet ; 2(1): 3, 2011 Feb 01.

Article in English | MEDLINE | ID: mdl-21284835

ABSTRACT

BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical edition and genotype calling and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. In between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data. RESULTS: In order to assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. 
Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it allowed a substantial decrease in the time required for sequencing analyses, as well as being a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.
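Step (5) above, rebuilding full-length haplotypes from output that lists only polymorphic sites, can be illustrated with a small sketch. It is not the authors' code and does not parse real PHASE files: the function name, the toy reference sequence and the 0-based positions are all hypothetical, and the sketch assumes that every monomorphic site carries the reference base.

```python
def expand_haplotype(reference, positions, alleles):
    """Rebuild a full-length haplotype from alleles at polymorphic sites only.

    reference: reference sequence covering the re-sequenced region
    positions: 0-based indices of the segregating sites (hypothetical input)
    alleles:   one inferred allele per segregating site, in the same order
    Monomorphic sites are assumed to carry the reference base.
    """
    if len(positions) != len(alleles):
        raise ValueError("need exactly one allele per polymorphic site")
    seq = list(reference)
    for pos, allele in zip(positions, alleles):
        seq[pos] = allele
    return "".join(seq)


# Toy data: two segregating sites in an 8-bp region.
reference = "ACGTACGT"
positions = [1, 5]
inferred = ["CC", "TG"]  # alleles at the two sites for two haplotypes

for hap in inferred:
    print(expand_haplotype(reference, positions, hap))
```

Downstream population genetics software that expects aligned full-length sequences can then read the expanded haplotypes directly, which is the purpose the abstract gives for this step.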
